Permuting Data on Random-Access Block Storage

Authors

  • Risi Thonangi
  • Jun Yang
Abstract

Permutation is a fundamental operator for array data, with applications in, for example, changing matrix layouts and reorganizing data cubes. We consider the problem of permuting large quantities of data stored on secondary storage that supports fast random block accesses, such as solid state drives and distributed key-value stores. Faster random accesses open up interesting new opportunities for permutation. While external merge sort has often been used for permutation, it is overkill: it fails to fully exploit the properties of permutation and carries unnecessary overhead in storing and comparing keys. We propose faster algorithms with lower memory requirements for a large, useful class of permutations. We also tackle practical challenges that traditional permutation algorithms have not dealt with, such as exploiting random block accesses more aggressively, considering the cost asymmetry between reads and writes, and handling arbitrary data dimension sizes (as opposed to perfect powers often assumed by previous work). As a result, our algorithms are faster and more broadly applicable.
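As a toy illustration of the matrix-layout case mentioned in the abstract, the sketch below transposes a row-major array tile by tile, so that each step touches only a small group of source and destination positions. It is not the authors' algorithm: the function name `transpose_tiled`, the `tile` parameter, and the use of an in-memory list as a stand-in for block storage are all assumptions made for illustration. It does show two points the abstract raises: no keys are stored or compared, and the dimensions need not be perfect powers or multiples of the tile size.

```python
def transpose_tiled(src, rows, cols, tile):
    """Transpose a row-major rows x cols array, one tile at a time.

    Hypothetical sketch: `src` is a flat list standing in for block
    storage, and `tile` is an assumed tile edge length. The target
    position of every element is computed from its indices, so no
    sort keys are stored or compared.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    dst = [None] * (rows * cols)
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # min() clips the last tile in each direction, so rows and
            # cols may be arbitrary sizes, not multiples of `tile`.
            for r in range(r0, min(r0 + tile, rows)):
                for c in range(c0, min(c0 + tile, cols)):
                    # Element (r, c) of the source becomes element
                    # (c, r) of the cols x rows result.
                    dst[c * rows + r] = src[r * cols + c]
    return dst


if __name__ == "__main__":
    # A 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] stored row by row.
    print(transpose_tiled(list(range(6)), rows=2, cols=3, tile=2))
```

On real block storage, the tile size would presumably be chosen so that one tile's worth of source and destination blocks fits in memory; that tuning is outside the scope of this sketch.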


Similar resources

Fast Parallel Permutation Algorithms

We investigate the problem of permuting n data items on an EREW PRAM with p processors using little additional storage. We present a simple algorithm with run time O((n/p) log n) and an improved algorithm with run time O(n/p + log n log log(n/p)). Both algorithms require n additional global bits and O(1) local storage per processor. If prefix summation is supported at the instruction level, the run time of the ...


Parallel Processing Letters Fast Parallel Permutation Algorithms

We investigate the problem of permuting n data items on an EREW PRAM with p processors using little additional storage. We present a simple algorithm with run time O((n/p) log n) and an improved algorithm with run time O(n/p + log n log log(n/p)). Both algorithms require n additional global bits and O(1) local storage per processor. If prefix summation is supported at the instruction level, the run ti...


Block-oriented random access MNOS

MNOS storage elements have been used to realize a block-oriented all-electronic secondary memory module. This nonvolatile storage unit offers 10-microsecond data access and reliable error-free operation. BORAM modules provide an immediately available, cost-effective alternative to electromechanical storage in severe environment applications. The text below describes an 18-million-bit advanced dev...


Offline Selective Data Deduplication for Primary Storage Systems

Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunat...


The Disk Drive as an Audio Recorder

The following tutorial paper describes how a block-structured mass storage device, such as a Winchester disk drive, may be made to function as part of a digital audio recording and editing system: the so-called 'tapeless recorder'. It outlines the principles of random access sound file storage and buffering, together with a discussion of digital audio requirements and the historical precedent f...



Journal:
  • PVLDB

Volume 6

Publication date: 2013